Unsupervised Improvement of Morphological Analyzer for Inflectionally Rich Languages

Authors

  • Akshar Bharati
  • Rajeev Sangal
  • Sushma Bendre
  • Pavan Kumar
  • Aishwarya
Abstract

This paper presents an algorithm for unsupervised learning of morphological analysis and generation for inflectionally rich languages like Hindi, given a low-coverage morph and a corpus of raw text. It assumes no particular theoretical model of morph, but can work with any morph that defines classes of stems that behave similarly. The morph learning algorithm uses the concept of an 'observable paradigm'. The results of the algorithm are encouraging, with the coverage of a primitive morph going up from 32% to about 63% and that of an advanced morph going up from 96% to about 97%.
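
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that an 'observable paradigm' is the set of suffixed forms of a candidate stem that actually occur in the corpus, and that an unknown stem can be assigned to an existing class when its observed suffixes fit that class. The paradigm classes, the toy English corpus, and all function names (`observed_suffixes`, `guess_class`, `learn_stems`, `coverage`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical paradigm classes: class name -> suffix set ("" is the bare stem).
PARADIGMS = {
    "noun_s": {"", "s"},
    "verb_reg": {"", "s", "ed", "ing"},
}


def observed_suffixes(words, suffixes):
    """Map each candidate stem to the suffixes it is seen with in the corpus."""
    seen = defaultdict(set)
    for word in set(words):
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf):
                stem = word[: len(word) - len(suf)]
                seen[stem].add(suf)
    return seen


def guess_class(observed, paradigms, min_forms=2):
    """Pick the class that contains all observed suffixes and is best filled.

    Assumption: a stem needs at least `min_forms` attested forms to be trusted.
    """
    if len(observed) < min_forms:
        return None
    best, best_score = None, 0.0
    for name, sufs in paradigms.items():
        if observed <= sufs:
            score = len(observed) / len(sufs)
            if score > best_score:
                best, best_score = name, score
    return best


def learn_stems(words, lexicon, paradigms):
    """Propose classes for stems that the starting lexicon does not know."""
    all_suffixes = set().union(*paradigms.values())
    learned = {}
    for stem, observed in observed_suffixes(words, all_suffixes).items():
        if stem in lexicon:
            continue
        cls = guess_class(observed, paradigms)
        if cls is not None:
            learned[stem] = cls
    return learned


def coverage(words, lexicon, paradigms):
    """Fraction of corpus tokens that the lexicon plus paradigms can analyze."""
    analyzable = {
        stem + suf for stem, cls in lexicon.items() for suf in paradigms[cls]
    }
    return sum(w in analyzable for w in words) / len(words)


# Toy data, invented for illustration only.
CORPUS = "walk walks walked walking cat cats dog runs".split()
LEXICON = {"dog": "noun_s"}

before = coverage(CORPUS, LEXICON, PARADIGMS)
learned = learn_stems(CORPUS, LEXICON, PARADIGMS)
after = coverage(CORPUS, {**LEXICON, **learned}, PARADIGMS)
print(before, learned, after)
```

On this toy corpus the sketch learns `walk` and `cat`, while `runs` stays unanalyzed because the stem `run` is attested with only one suffix. The toy coverage values say nothing about the figures reported in the abstract.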

Similar resources

Low-Resource Active Learning of Morphological Segmentation

Many Uralic languages have a rich morphological structure, but lack morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation with a ...

Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words

Automatic lemmatization is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma (base form) to each word in a running text is not trivial, since for instance, nouns inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, sin...

morphogen: Translation into Morphologically Rich Languages with Synthetic Phrases

We present morphogen, a tool for improving translation into morphologically rich languages with synthetic phrases. We approach the problem of translating into morphologically rich languages in two phases. First, an inflection model is learned to predict target word inflections from source side context. Then this model is used to create additional sentence specific translation phrases. These “synt...

Translating into Morphologically Rich Languages with Synthetic Phrases

Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific word- and phrase-level translations tha...

Unsupervised Morphology Rivals Supervised Morphology for Arabic MT

If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal de...

Publication date: 2001